Abstract 



The Johnson-Lindenstrauss Lemma is a classic result which implies that any set of n real 
vectors can be compressed to $O(\log n)$ dimensions while only distorting pairwise Euclidean 
distances by a constant factor. Here we consider potential extensions of this result to the 
compression of quantum states. We show that, by contrast with the classical case, there does 
not exist any distribution over quantum channels that significantly reduces the dimension of 
quantum states while preserving the 2-norm distance with high probability. We discuss two 
tasks for which the 2-norm distance is indeed the correct figure of merit. In the case of the trace 
norm, we show that the dimension of low-rank mixed states can be reduced by up to a square 
root, but that essentially no dimensionality reduction is possible for highly mixed states. 

1 Introduction 

The Johnson-Lindenstrauss (JL) Lemma [19] is a dimensionality reduction result which has found 
a vast array of applications in computer science and elsewhere (see e.g. [17, 18, 21]). It can be 
stated as follows: 

Theorem 1 (Johnson-Lindenstrauss Lemma [19]). For all dimensions $d$, $e$, there is a distribution 
$\mathcal{D}$ over linear maps $\mathcal{E} : \mathbb{R}^d \to \mathbb{R}^e$ such that, for all real vectors $v$, $w$, 

$$\Pr_{\mathcal{E} \sim \mathcal{D}} \left[ (1-\epsilon)\|v - w\|_2 \le \|\mathcal{E}(v) - \mathcal{E}(w)\|_2 \le \|v - w\|_2 \right] \ge 1 - \exp(-\Omega(\epsilon^2 e)),$$

where $\|\cdot\|_2$ is the Euclidean ($\ell_2$) distance. The lemma is usually applied via the following corollary, 
which follows by taking a union bound: 

Corollary 2. Given a set $S$ of $n$ $d$-dimensional real vectors, there is a linear map $\mathcal{E} : \mathbb{R}^d \to 
\mathbb{R}^{O(\log n/\epsilon^2)}$ that preserves all Euclidean distances in $S$, up to a multiple of $1 - \epsilon$. Further, there is 
an efficient randomised algorithm to find and implement $\mathcal{E}$. 

There are several remarkable aspects of this result. First, the target dimension does not depend 
on the source dimension $d$ at all. Second, the randomised algorithm can be simply stated as: choose 
a random $e$-dimensional subspace with $e = O(\log n/\epsilon^2)$, project each vector in $S$ onto this subspace, 
and rescale the result by a constant that does not depend on $S$. Third, this algorithm is oblivious: 
in other words, $\mathcal{E}$ does not depend on the vectors whose dimensionality is to be reduced. 
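
The oblivious algorithm described above can be illustrated numerically. The following is a minimal sketch, not the construction analysed in [19]: as an assumption made for brevity, it replaces projection onto a random subspace by multiplication with a Gaussian matrix with variance-$1/e$ entries, a commonly used variant that is oblivious but not contractive. The function names and the parameters ($d = 300$, $e = 200$, five points) are illustrative choices only.

```python
import math
import random


def gaussian_projection(d, e, rng):
    """Return an e x d matrix with i.i.d. N(0, 1/e) entries.

    Assumption: a Gaussian matrix stands in for "project onto a random
    e-dimensional subspace and rescale"; it never looks at the data.
    """
    scale = 1.0 / math.sqrt(e)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(e)]


def apply_map(matrix, v):
    """Apply the linear map given by `matrix` to the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def distance(v, w):
    """Euclidean distance between v and w."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))


rng = random.Random(0)
d, e, n = 300, 200, 5
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
projection = gaussian_projection(d, e, rng)  # chosen before seeing the points
images = [apply_map(projection, v) for v in points]

# Ratio of embedded distance to original distance for every pair of points.
ratios = [
    distance(images[i], images[j]) / distance(points[i], points[j])
    for i in range(n)
    for j in range(i + 1, n)
]
print(min(ratios), max(ratios))
```

All pairwise ratios should be close to 1; increasing `e` tightens the spread, independently of `d`.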

More generally, let $\ell_p^d$ be the vector space $\mathbb{R}^d$ equipped with the $\ell_p$ norm $\|\cdot\|_p$. A randomised 
embedding from $\ell_p^d$ to $\ell_p^e$ with distortion $1/(1-\epsilon)$ (see footnote 1) and failure probability $\delta$ is a distribution $\mathcal{D}$ over 
maps $\mathcal{E} : \mathbb{R}^d \to \mathbb{R}^e$ such that, for all $v, w \in \mathbb{R}^d$, 

$$\Pr_{\mathcal{E} \sim \mathcal{D}} \left[ (1-\epsilon)\|v - w\|_p \le \|\mathcal{E}(v) - \mathcal{E}(w)\|_p \le \|v - w\|_p \right] \ge 1 - \delta.$$

This definition does not allow the distance between vectors to increase; such embeddings are called 
contractive. The JL Lemma states that there exists a randomised embedding from $\ell_2^d$ to $\ell_2^e$ with 
distortion $1/(1-\epsilon)$ and failure probability $\exp(-\Omega(\epsilon^2 e))$. Another natural norm to consider in 
this context is $\ell_1$. In this case the situation is less favourable: it has been shown by Charikar 
and Sahai [11] that there exist $O(d)$ points in $\ell_1^d$ such that any linear embedding into $\ell_1^e$ must 
incur distortion $\Omega(\sqrt{d/e})$. Brinkman and Charikar later gave a set of $n$ points for which any (even 
non-linear) embedding achieving distortion $D$ requires $n^{\Omega(1/D^2)}$ dimensions [9]. 

1.1 The JL Lemma in quantum information theory 

The JL Lemma immediately gives rise to a protocol for quantum fingerprinting [10], or in other 
words efficient equality testing. Imagine that Alice and Bob each have an $n$-bit string, and are 
required to send quantum states of the shortest possible length to a referee, who has to use these 
states to determine if their bit strings are equal (this is the so-called SMP, or simultaneous message 
passing, model of communication complexity [20]). Associate each bit string with an orthonormal 
basis vector of $\mathbb{R}^{2^n}$. Then the JL Lemma guarantees that there exists a map from $\mathbb{R}^{2^n}$ into $\mathbb{R}^{O(n)}$ 
such that the inner products between all of these $2^n$ vectors are preserved, up to a small constant. 
So Alice and Bob each simply apply this map to their vectors, renormalise the output (which makes 
very little difference to the inner products), and send the $O(\log n)$ qubit states corresponding to 
the resulting $O(n)$-dimensional vectors to the referee, who applies the swap test to the states [10]. 
Given two states $|\psi\rangle$, $|\phi\rangle$, this test accepts with probability $\frac{1}{2} + \frac{1}{2}|\langle\psi|\phi\rangle|^2$. As the inner products 
are approximately preserved by the map into $\mathbb{R}^{O(n)}$, the referee can distinguish between the two 
cases of the states he receives being equal or distinct, with constant probability. 
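
As a small illustration of the referee's test, the sketch below evaluates the swap test acceptance probability $\frac{1}{2} + \frac{1}{2}|\langle\psi|\phi\rangle|^2$ for states given as lists of amplitudes. It only computes the formula quoted above; it does not simulate the circuit, and the function names are illustrative.

```python
import math


def inner_product(psi, phi):
    """<psi|phi> for two state vectors given as lists of amplitudes."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(psi, phi))


def swap_test_accept_probability(psi, phi):
    """Acceptance probability 1/2 + |<psi|phi>|^2 / 2 of the swap test."""
    return 0.5 + 0.5 * abs(inner_product(psi, phi)) ** 2


zero = [1.0, 0.0]
one = [0.0, 1.0]
plus = [1.0 / math.sqrt(2), 1.0 / math.sqrt(2)]

print(swap_test_accept_probability(zero, zero))  # identical states
print(swap_test_accept_probability(zero, one))   # orthogonal states
print(swap_test_accept_probability(zero, plus))  # overlap 1/sqrt(2)
```

Identical states are always accepted, while orthogonal states are accepted with probability exactly $\frac{1}{2}$, which is the constant gap the referee relies on.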

More generally, Alice and Bob can use a similar SMP protocol to solve the following task: given 
quantum states $|\psi_A\rangle$, $|\psi_B\rangle$, each picked from a set of $k$ states, determine $\langle\psi_A|\psi_B\rangle$ up to a constant. 
Whatever the initial dimension of the states, the JL Lemma (strictly speaking, an easy extension of 
the JL Lemma to complex vectors) guarantees that they can be compressed to $O(\log k)$ dimensions 
with at most constant distortion, implying that the referee can estimate $\langle\psi_A|\psi_B\rangle$ up to a constant 
using only $O(\log\log k)$ qubits of communication. 

However, there is a problem with this protocol. While it is oblivious in the sense that it does 
not depend on the k states which are given as input, it is not oblivious in the following quantum 
sense: Alice and Bob each need to know what their states are in order to apply the embedding (see footnote 2). 
One would expect the right quantum analogue of a randomised embedding to map quantum states 
to quantum states in an oblivious fashion. Such an algorithm can be expressed as a distribution 
over quantum channels (completely positive, trace preserving (CPTP) maps [23, 25]), which are 
the class of physically implementable operations in quantum theory. 

1 We use this somewhat clumsy definition of distortion for consistency with prior work. 

2 On the other hand, if the unphysical operation of postselection is allowed, the JL Lemma can be applied directly. 
Let $B(d)$ denote the set of $d$-dimensional Hermitian operators. The distance between quantum 
states $\rho, \sigma \in B(d)$ can be measured using the Schatten $p$-norm $\|\rho - \sigma\|_p$, which is defined as 
$\|X\|_p = \left(\sum_i |\lambda_i(X)|^p\right)^{1/p}$, where $\lambda_i(X)$ is the $i$'th eigenvalue of $X$. The case $p = 1$ is known as the 
trace norm, and $p = 2$ is sometimes known as the Hilbert-Schmidt norm. We have the following 
definition. 

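Definition 1. A quantum embedding from $S \subseteq B(d)$ to $B(e)$ in the Schatten $p$-norm, with distortion 
$1/(1-\epsilon)$ and failure probability $\delta$, is a distribution $\mathcal{D}$ over quantum channels $\mathcal{E} : B(d) \to B(e)$ such 
that, for all $\rho, \sigma \in S$, 

$$\Pr_{\mathcal{E} \sim \mathcal{D}} \left[ (1-\epsilon)\|\rho - \sigma\|_p \le \|\mathcal{E}(\rho) - \mathcal{E}(\sigma)\|_p \le \|\rho - \sigma\|_p \right] \ge 1 - \delta.$$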

Rather than only considering embeddings that succeed for all states in B(d), we generalise the 
definition to subsets of states. An interesting such subset is the pure states, for which one might 
imagine stronger embeddings can be obtained. Indeed, a closely related notion has been studied 
before by Winter [26], and more recently Hayden and Winter [16], under the name of quantum 
identification for the identity channel. In this setting, the sender Alice has a pure state $|\psi\rangle \in \mathbb{C}^d$ 
and the receiver Bob is given the description of a pure state $|\phi\rangle \in \mathbb{C}^d$. Alice encodes her state 
$|\psi\rangle$ as a quantum message using a quantum channel $\mathcal{E} : B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ and sends it to Bob, who 
performs a measurement $(D_\phi, I - D_\phi)$ on the message. The goal is to obtain approximately the 
same measurement statistics as if Bob had performed the measurement $(|\phi\rangle\langle\phi|, I - |\phi\rangle\langle\phi|)$ on $|\psi\rangle$: 

$$\forall\, |\psi\rangle, |\phi\rangle, \quad \left| \mathrm{tr}[D_\phi\, \mathcal{E}(|\psi\rangle\langle\psi|)] - |\langle\phi|\psi\rangle|^2 \right| \le \epsilon.$$

Winter showed in [26] that, for constant $\epsilon$, this can be achieved with $e = O(\sqrt{d})$; note that the 
resulting states $\mathcal{E}(|\psi\rangle\langle\psi|)$ are highly mixed. Winter's result allows the development of a one-way 
protocol for testing equality of $n$-bit strings using $\frac{1}{2}\log_2 n + O(1)$ qubits of communication from 
Alice to Bob, which is still the best known separation between one-way quantum and classical 
communication complexity for total functions [1]. In our terminology, the result of [26] shows that 
there exists a quantum embedding from $B(d)$ to $B(O(\sqrt{d}))$ that approximately preserves the trace 
distance between (initially) pure states. But note that one aspect of Winter's result is stronger than 
we need: he showed the existence of a channel such that the distance is approximately preserved 
between all pairs of states. Here, we are interested in finding distributions $\mathcal{D}$ over channels $\mathcal{E}$ such 
that, for an arbitrary pair of states, the distance is approximately preserved with high probability; 
this is potentially a weaker notion. In particular, it is not necessarily true that the individual channel 
obtained by averaging over $\mathcal{D}$ will preserve the distance between an arbitrary pair of states. 

We pause to mention that the JL Lemma has found some other uses in quantum information 
theory. Cleve et al [12] used it to give an upper bound on the amount of shared entanglement 
required to win a particular class of nonlocal games. Gavinsky, Kempe and de Wolf [14] used it 
to give a simulation of arbitrary quantum communication protocols by quantum SMP protocols 
(with exponential overhead). Embeddings between norms have also been used. Aubrun, Szarek 
and Werner [4, 3] have used a version of Dvoretzky's theorem on "almost-Euclidean" subspaces of 
matrices under Schatten norms to give counterexamples to the additivity conjectures of quantum 
information theory. And, very recently, Fawzi, Hayden and Sen [13] have used ideas from the 
theory of low-distortion embeddings of the "$\ell_1(\ell_2)$" norm to prove the existence of strong entropic 
uncertainty relations. 
1.2 Our results 

In this paper, we show that the dimensionality reduction that can be achieved by quantum embeddings 
is very limited. We begin, in Section 2, by considering the Schatten 2-norm (which is just 
the vector 2-norm on matrices). We show that, in stark contrast to the JL Lemma, any quantum 
embedding which preserves the 2-norm distance between (say) orthogonal pure states with constant 
distortion and constant failure probability can only achieve at most a constant reduction in 
dimension. 

One potential criticism of this result is that the 2-norm is not usually seen as a physically 
meaningful distance measure, as compared with the trace norm. However, we argue in Section 
3 that for certain problems the 2-norm is indeed the correct distance measure. We discuss two 
problems - equality testing without a reference frame and state discrimination with a random 
measurement - where the 2-norm appears naturally as the figure of merit. 

In Section 4 we turn to the trace norm, for which we have upper and lower bounds. On the 
upper bound side, we extend the result of Winter [26] to show that low-rank mixed states are 
also amenable to dimensionality reduction; roughly speaking, $d$-dimensional mixed states of rank $r$ 
can be embedded into $O(\sqrt{rd})$ dimensions with constant distortion. On the other hand, we show 
using the 2-norm lower bound that highly mixed states cannot be embedded into low dimension: 
there is a lower bound of $\Omega\left(\sqrt{d}\,\|\rho-\sigma\|_1/\|\rho-\sigma\|_2\right)$ on the target dimension of any constant distortion trace 
norm embedding that succeeds with constant probability for the pairs $U\rho U^\dagger$, $U\sigma U^\dagger$ for all unitary 
operators $U$. In particular, this implies an $\Omega(\sqrt{d})$ lower bound for any embedding which succeeds 
for a unitarily invariant set of states. In the case that $|\rho - \sigma|$ is proportional to a projector (i.e. all 
non-zero eigenvalues of $\rho - \sigma$ are equal in absolute value), our upper and lower bounds coincide. 

Finally, some notes on miscellaneous notation. $F_d$ will denote the unitary operator which swaps 
(or flips) two $d$-dimensional quantum systems (i.e. $F_d = \sum_{i,j=1}^{d} |i\rangle\langle j| \otimes |j\rangle\langle i|$), and $I_d$ will denote 
the $d$-dimensional identity matrix. Whenever we say that $U \in U(d)$ is a random unitary operator, 
we mean that $U$ is picked uniformly at random according to Haar measure on the unitary group 
$U(d)$. 

2 Dimensionality reduction in the 2-norm 

We now show that quantum dimensionality reduction in the 2-norm is very limited. 

Theorem 3. Let $\mathcal{D}$ be a distribution over quantum channels (CPTP maps) $\mathcal{E} : B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ 
such that, for fixed quantum states $\rho \ne \sigma$ and for all unitary operators $U \in U(d)$, 

$$\Pr_{\mathcal{E} \sim \mathcal{D}} \left[ \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2 \ge (1-\epsilon)\|U\rho U^\dagger - U\sigma U^\dagger\|_2 \right] \ge 1 - \delta$$

for some $0 \le \epsilon, \delta < 1$. Then $e \ge (1-\delta)(1-\epsilon)^2 d$. 

Note that the above lower bound on target dimension holds for any embedding of a unitarily 
invariant set of states. For example, taking $\rho$ and $\sigma$ to be orthogonal pure states and inserting 
$\epsilon = \delta = 0$ recovers the (unsurprising) result that any embedding that exactly preserves distances 
between all orthogonal pure states with certainty must satisfy $e \ge d$. More generally, if we have 
an embedding which succeeds with constant probability and has constant distortion, the target 
dimension can be no smaller than $\Omega(d)$. In order to prove the theorem, we will need the following 
two technical lemmas, which are proved in Appendix A. 
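
To get a feel for the bound in Theorem 3, the following sketch simply evaluates $(1-\delta)(1-\epsilon)^2 d$ with exact rational arithmetic. The function name and the sample parameters are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction


def two_norm_dimension_lower_bound(d, eps, delta):
    """Lower bound (1 - delta) * (1 - eps)^2 * d on the target dimension e."""
    return (1 - delta) * (1 - eps) ** 2 * d


# Exact embedding with certainty: no reduction is possible at all.
print(two_norm_dimension_lower_bound(1024, Fraction(0), Fraction(0)))

# Distortion 2 (eps = 1/2) and failure probability 1/2: still linear in d.
print(two_norm_dimension_lower_bound(1024, Fraction(1, 2), Fraction(1, 2)))
```

Even with generous constants the bound loses only a constant factor (here a factor of 8), in contrast with the logarithmic target dimension of the classical JL Lemma.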
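Lemma 4. Let $\mathcal{E} : B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ be a quantum channel (CPTP map). Then 

$$\mathrm{tr}[F_e\, \mathcal{E}^{\otimes 2}(F_d)] \le de.$$

Lemma 5. Let $\rho$ and $\sigma$ be $d$-dimensional quantum states. Then 

$$\int U^{\otimes 2} (\rho - \sigma)^{\otimes 2} (U^\dagger)^{\otimes 2}\, dU = \frac{\|\rho - \sigma\|_2^2}{d^2 - 1} \left( F_d - \frac{I_{d^2}}{d} \right).$$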

The following lemma is the key to most of the results in this paper. 
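Lemma 6. Let $\rho$ and $\sigma$ be quantum states and let $\mathcal{E} : B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ be a quantum channel. Then 

$$\int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\, dU \le \frac{d(e^2 - 1)}{e(d^2 - 1)} \|\rho - \sigma\|_2^2.$$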



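Proof. We have 

$$\int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\, dU = \int \|\mathcal{E}(U(\rho - \sigma)U^\dagger)\|_2^2\, dU$$
$$= \int \mathrm{tr}\left[ F_e\, \mathcal{E}(U(\rho - \sigma)U^\dagger)^{\otimes 2} \right] dU$$
$$= \mathrm{tr}\left[ F_e\, \mathcal{E}^{\otimes 2}\left( \int U^{\otimes 2} (\rho - \sigma)^{\otimes 2} (U^\dagger)^{\otimes 2}\, dU \right) \right]$$
$$= \frac{\|\rho - \sigma\|_2^2}{d^2 - 1}\, \mathrm{tr}\left[ F_e\, \mathcal{E}^{\otimes 2}\left( F_d - \frac{I_{d^2}}{d} \right) \right]$$
$$\le \frac{\|\rho - \sigma\|_2^2}{d^2 - 1} \left( de - d\, \mathrm{tr}[\mathcal{E}(I_d/d)^2] \right)$$
$$\le \frac{d(e^2 - 1)}{e(d^2 - 1)} \|\rho - \sigma\|_2^2.$$

We use linearity of $\mathcal{E}$ in the first equality, and the second equality is the tensor product trick 
$\mathrm{tr}[X^2] = \mathrm{tr}[F_e X^{\otimes 2}]$ for $e$-dimensional operators $X$. The fourth equality is Lemma 5, the first 
inequality is Lemma 4, and the second inequality is simply $\mathrm{tr}\, \rho^2 \ge 1/e$ for all $e$-dimensional states 
$\rho$. □ 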

We are finally ready to prove Theorem 3. 

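Proof of Theorem 3. We will prove something slightly stronger: that for a random $U$, the 2-norm 
is not approximately preserved under a map $\mathcal{E}$ picked from $\mathcal{D}$, unless $e$ is almost as large as $d$. So 
assume 

$$\Pr_{\mathcal{E} \sim \mathcal{D},\, U \in U(d)} \left[ \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2 \ge (1-\epsilon)\|U\rho U^\dagger - U\sigma U^\dagger\|_2 \right] \ge 1 - \delta,$$

or equivalently 

$$\Pr_{\mathcal{E} \sim \mathcal{D},\, U \in U(d)} \left[ \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2 \ge (1-\epsilon)^2 \|\rho - \sigma\|_2^2 \right] \ge 1 - \delta,$$

where we use the unitary invariance of the 2-norm. By Markov's inequality, this implies that 

$$\mathbb{E}_{\mathcal{E} \sim \mathcal{D}} \int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\, dU \ge (1-\delta)(1-\epsilon)^2 \|\rho - \sigma\|_2^2,$$

implying in turn that there must exist some $\mathcal{E}$ such that 

$$\int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\, dU \ge (1-\delta)(1-\epsilon)^2 \|\rho - \sigma\|_2^2.$$

So let $\mathcal{E} : B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ be a quantum channel that does satisfy this inequality. Then we have 

$$(1-\delta)(1-\epsilon)^2 \|\rho - \sigma\|_2^2 \le \int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\, dU \le \left( \frac{e}{d} \right) \|\rho - \sigma\|_2^2,$$

where the second inequality follows from Lemma 6, assuming that $e \le d$. We have shown that 
$e \ge (1-\delta)(1-\epsilon)^2 d$, completing the proof of the theorem. □ 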
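
The last step of the proof uses the fact that the prefactor $d(e^2-1)/(e(d^2-1))$ of Lemma 6 is at most $e/d$ whenever $e \le d$. This follows by cross-multiplying, since $d^2(e^2-1) \le e^2(d^2-1)$ is equivalent to $e^2 \le d^2$. The short sketch below, with an illustrative function name, confirms this with exact arithmetic over a range of small dimensions.

```python
from fractions import Fraction


def lemma6_factor(d, e):
    """The prefactor d(e^2 - 1) / (e(d^2 - 1)) from Lemma 6, for d >= 2."""
    return Fraction(d * (e * e - 1), e * (d * d - 1))


# For every e <= d the factor is at most e/d, with equality only at e = d.
holds = all(
    lemma6_factor(d, e) <= Fraction(e, d)
    for d in range(2, 40)
    for e in range(1, d + 1)
)
print(holds)

# The assumption e <= d matters: for e > d the comparison reverses.
print(lemma6_factor(2, 3), Fraction(3, 2))
```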

3 Operational meaning of the 2-norm 

In this section, we discuss the meaning of the 2-norm distance between quantum states. It is usually 
assumed that the trace norm is the "right" measure of distance between states, and proofs going via 
the 2-norm usually do so only for calculational simplicity. However, here we argue that the 2-norm 
is of interest in its own right, by giving two operational interpretations of this distance measure. 

3.1 Equality testing without a reference frame 

Consider the following equality-testing game. We are given a description of two different states $\rho$ 
and $\sigma$. An adversary prepares two systems in one of the states $\rho \otimes \rho$, $\sigma \otimes \sigma$, $\rho \otimes \sigma$ or $\sigma \otimes \rho$, with 
equal probability of each. He then applies an unknown unitary $U$ to each system (i.e. he applies 
$U \otimes U$ to the joint state). Our task is to determine whether the two systems have the same state 
or different states. This models equality testing in a two-party scenario in which the preparer and 
tester do not share a reference frame [5]. One protocol for solving this task is simply to apply the 
swap test [10] to the two states we are given, output "same" if the test accepts, and "different" 
otherwise. When applied to two states $\rho$, $\sigma$ this test accepts with probability $\frac{1}{2} + \frac{1}{2}\mathrm{tr}\, \rho\sigma$, so for 
any $U$ the overall probability of success is 

$$\frac{1}{4}\left( \frac{1}{2} + \frac{1}{2}\mathrm{tr}[\rho^2] \right) + \frac{1}{4}\left( \frac{1}{2} + \frac{1}{2}\mathrm{tr}[\sigma^2] \right) + \frac{1}{2}\left( \frac{1}{2} - \frac{1}{2}\mathrm{tr}[\rho\sigma] \right) = \frac{1}{2} + \frac{1}{8}\|\rho - \sigma\|_2^2.$$
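
The identity above is easy to check numerically. The sketch below, which uses plain nested lists for density matrices and illustrative function names, computes the left-hand side from $\mathrm{tr}[\rho^2]$, $\mathrm{tr}[\sigma^2]$ and $\mathrm{tr}[\rho\sigma]$ and compares it with $\frac{1}{2} + \frac{1}{8}\|\rho-\sigma\|_2^2$.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def swap_test_success(rho, sigma):
    """Success probability of the swap-test protocol for the four-way mixture."""
    tr_rr = complex(trace(matmul(rho, rho))).real
    tr_ss = complex(trace(matmul(sigma, sigma))).real
    tr_rs = complex(trace(matmul(rho, sigma))).real
    return (0.25 * (0.5 + 0.5 * tr_rr)
            + 0.25 * (0.5 + 0.5 * tr_ss)
            + 0.5 * (0.5 - 0.5 * tr_rs))


def hs_distance_squared(rho, sigma):
    """Squared Schatten 2-norm (Hilbert-Schmidt) distance."""
    n = len(rho)
    return sum(abs(rho[i][j] - sigma[i][j]) ** 2
               for i in range(n) for j in range(n))


pure0 = [[1.0, 0.0], [0.0, 0.0]]
pure1 = [[0.0, 0.0], [0.0, 1.0]]
mixed = [[0.5, 0.0], [0.0, 0.5]]

for rho, sigma in [(pure0, pure1), (pure0, mixed)]:
    print(swap_test_success(rho, sigma),
          0.5 + hs_distance_squared(rho, sigma) / 8)
```

For orthogonal pure states both expressions give 0.75; for a pure state against the maximally mixed qubit state they give 0.5625.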

Using our previous result, we now show that this is optimal. 

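Theorem 7. The maximal probability of success of the above game is $\frac{1}{2} + \frac{1}{8}\|\rho - \sigma\|_2^2$. 

Proof. Let $(M, I - M)$ be an arbitrary POVM where the operator $M$ corresponds to the answer 
"same". Then the probability of success achieved by this POVM for a given $U$ is $\frac{1}{2} + \frac{1}{2}B$, where $B$ 
is the bias, which is equal to 

$$\mathrm{tr}\left[ M \left( \frac{1}{2}\left( U\rho U^\dagger \otimes U\rho U^\dagger + U\sigma U^\dagger \otimes U\sigma U^\dagger \right) - \frac{1}{2}\left( U\rho U^\dagger \otimes U\sigma U^\dagger + U\sigma U^\dagger \otimes U\rho U^\dagger \right) \right) \right].$$

If the adversary adopts the strategy of picking $U$ uniformly at random, the average bias obtained 
is 

$$\frac{1}{2}\mathrm{tr}\left[ M \int U^{\otimes 2} (\rho \otimes \rho + \sigma \otimes \sigma - \rho \otimes \sigma - \sigma \otimes \rho)(U^\dagger)^{\otimes 2}\, dU \right] = \frac{1}{2}\mathrm{tr}\left[ M \int U^{\otimes 2} (\rho - \sigma)^{\otimes 2} (U^\dagger)^{\otimes 2}\, dU \right],$$

which by Lemma 5 is equal to 

$$\frac{\|\rho - \sigma\|_2^2}{2(d^2 - 1)}\, \mathrm{tr}\left[ M \left( F_d - \frac{I_{d^2}}{d} \right) \right].$$

This expression is maximised by setting $M$ equal to a projector onto the subspace spanned by the 
eigenvectors of $F_d - I_{d^2}/d$ with positive eigenvalues. As $F_d$ has $d(d+1)/2$ eigenvalues equal to 1, and 
$d(d-1)/2$ eigenvalues equal to $-1$, we obtain $\mathrm{tr}[M(F_d - I_{d^2}/d)] = (d^2 - 1)/2$. This implies that 
the average bias is at most $\frac{1}{4}\|\rho - \sigma\|_2^2$. As the worst-case bias can only be lower, this implies the 
claimed result. 

□ 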
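
The eigenvalue count in the proof can be checked directly for small $d$. The sketch below builds the swap operator $F_d$ as a $d^2 \times d^2$ matrix, takes $M = (I + F_d)/2$ (the projector onto the symmetric subspace, which is exactly the positive eigenspace of $F_d - I_{d^2}/d$ for $d \ge 2$), and evaluates $\mathrm{tr}[M(F_d - I_{d^2}/d)]$ with exact fractions. The helper names are illustrative.

```python
from fractions import Fraction


def swap_operator(d):
    """Matrix of F_d, which maps |i>|j> to |j>|i>, in the basis index i*d + j."""
    n = d * d
    f = [[Fraction(0)] * n for _ in range(n)]
    for i in range(d):
        for j in range(d):
            f[j * d + i][i * d + j] = Fraction(1)
    return f


def optimal_operator_value(d):
    """tr[M (F_d - I/d)] for M = (I + F_d)/2, the symmetric-subspace projector."""
    n = d * d
    f = swap_operator(d)
    eye = [[Fraction(int(a == b)) for b in range(n)] for a in range(n)]
    m = [[(eye[a][b] + f[a][b]) / 2 for b in range(n)] for a in range(n)]
    target = [[f[a][b] - eye[a][b] / d for b in range(n)] for a in range(n)]
    # tr[M T] = sum over a, b of M[a][b] * T[b][a].
    return sum(m[a][b] * target[b][a] for a in range(n) for b in range(n))


for d in (2, 3, 4):
    print(d, optimal_operator_value(d), Fraction(d * d - 1, 2))
```

Substituting $(d^2-1)/2$ into the Lemma 5 expression gives an average bias of $\frac{1}{4}\|\rho-\sigma\|_2^2$, and hence a success probability of $\frac{1}{2} + \frac{1}{8}\|\rho-\sigma\|_2^2$.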



3.2 Performing a random measurement 

The second game we will discuss is state discrimination with a fixed or random measurement. 
Imagine we are given a state which is promised to be either $\rho$ or $\sigma$, with equal probability of each, 
and we wish to determine which is the case. It is well-known that the largest bias achievable by 
choosing an appropriate measurement is $\frac{1}{2}\|\rho - \sigma\|_1$ (recall from the previous section that the bias 
$B$ and the success probability $p$ have the relationship $p = \frac{1}{2} + \frac{B}{2}$). But how well can we do if the 
measurement we apply does not in fact depend on $\rho$ and $\sigma$? 

We will see that $\|\rho - \sigma\|_2$ is closely related to the optimal bias achievable by performing one of 
the following two measurements, and deciding whether the state is $\rho$ or $\sigma$ based on the outcome. 

• The uniform (isotropic) POVM whose measurement elements consist of normalised projectors 
onto all states 

• A projective measurement in a random basis (i.e. applying a random unitary operator and 
measuring in the computational basis). 

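In general, the largest bias achievable by measuring a POVM $M$ which consists of measurement 
operators $M_i$ can be written as 

$$\frac{1}{2} \sum_i \left| \mathrm{tr}[M_i(\rho - \sigma)] \right|.$$

Each measurement operator of the uniform POVM is given by the projector onto some state $|\psi\rangle$, 
normalised by a factor of $d$ (to check that this is right, note that 

$$d \int d\psi\, |\psi\rangle\langle\psi| = d \left( \frac{I_d}{d} \right) = I_d$$

as expected). So the bias induced by the uniform POVM is 

$$\frac{d}{2} \int d\psi\, \left| \langle\psi|(\rho - \sigma)|\psi\rangle \right|.$$

In the case of a measurement in a random basis $U \in U(d)$, we can calculate the expected bias as 
follows: 

$$\frac{1}{2}\, \mathbb{E}_U \sum_{i=1}^{d} \left| \langle i|U^\dagger(\rho - \sigma)U|i\rangle \right| = \frac{1}{2} \sum_{i=1}^{d} \mathbb{E}_U \left| \langle i|U^\dagger(\rho - \sigma)U|i\rangle \right| = \frac{d}{2}\, \mathbb{E}_U \left| \langle 1|U^\dagger(\rho - \sigma)U|1\rangle \right|,$$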
so these quantities are the same. They are also closely related to the 2-norm distance, as we will 
now see. 

Theorem 8. Let $\rho$, $\sigma$ be $d$-dimensional quantum states. Then 

$$\frac{1}{3}\|\rho - \sigma\|_2 \le d \int d\psi\, \left| \langle\psi|(\rho - \sigma)|\psi\rangle \right| \le \|\rho - \sigma\|_2.$$

The lower bound in Theorem 8 was shown by Ambainis and Emerson [2] (see also the proof of 
Matthews, Wehner and Winter [22]), and the upper bound is not hard. However, as this result does 
not appear to be widely known, we include a proof (which is essentially the same as that of [22]) 
in Appendix B. 

In fact, the corresponding upper and lower bounds on the bias hold for any fixed POVM whose 
measurement vectors form a 4-design [2], and the upper bound even holds for any fixed POVM 
whose vectors form a 2-design. This result can be useful in cases where one wishes to perform state 
discrimination without necessarily being able to construct the optimal measurement efficiently [24]. 
See the work [22] for much more detail on the bias achievable in state discrimination with fixed 
measurements. 
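
The quantity $d \int d\psi\, |\langle\psi|(\rho-\sigma)|\psi\rangle|$ discussed in this section can be estimated by sampling. The following is a rough Monte Carlo sketch under two stated assumptions: Haar-random pure states are generated by normalising vectors of independent complex Gaussians, and $\rho - \sigma$ is taken to be diagonal (which loses no generality, since the measure on $|\psi\rangle$ is unitarily invariant). Function names and the sample size are illustrative.

```python
import math
import random


def random_state(d, rng):
    """A Haar-random pure state: a normalised complex Gaussian vector."""
    amps = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(d)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]


def uniform_povm_quantity(eigenvalues, samples, rng):
    """Estimate d * E_psi |<psi|(rho - sigma)|psi>| for diagonal rho - sigma."""
    d = len(eigenvalues)
    total = 0.0
    for _ in range(samples):
        psi = random_state(d, rng)
        total += abs(sum(lam * abs(a) ** 2 for lam, a in zip(eigenvalues, psi)))
    return d * total / samples


rng = random.Random(1)
eigenvalues = [1.0, -1.0]  # rho = |0><0|, sigma = |1><1|
two_norm = math.sqrt(sum(lam ** 2 for lam in eigenvalues))
estimate = uniform_povm_quantity(eigenvalues, 20000, rng)
print(two_norm / 3, estimate, two_norm)
```

For two orthogonal qubit states the exact value is 1 (the expectation of $|\cos\theta|$ over the Bloch sphere is $\frac{1}{2}$), which lies between the 2-norm bounds $\sqrt{2}/3$ and $\sqrt{2}$ and well below the trace-norm value $\|\rho-\sigma\|_1 = 2$.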



4 Dimensionality reduction in the trace norm 

In this section we consider embeddings that reduce dimension while preserving the trace norm 
distance between states. As no quantum channel can increase this distance, we first observe that 
any such embedding will automatically be contractive. 

4.1 Upper bound 

It was previously shown by Winter [26] that, in our language, $d$-dimensional pure states can be 
embedded into $B(O(\sqrt{d}))$ with constant distortion. We now extend this result to general mixed 
states, by showing that rank $r$ mixed states can be embedded into dimension $O(\sqrt{rd})$ with constant 
distortion. 

The embedding is conceptually very simple: apply a random unitary and trace out a subsystem. 
However, when the target dimension $e$ does not divide $d$, we are forced to consider random isometries 
$V : \mathbb{C}^d \to \mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$ instead of unitaries, where $\lceil x \rceil$ is the smallest integer $y$ such that $y \ge x$. 
Recall that an isometry is a norm-preserving linear map, i.e. a map taking an orthonormal basis 
of one space to an orthonormal set of vectors in another (potentially larger) space. A random 
isometry is defined as a fixed isometry followed by a random unitary. 

Formally, our embedding is a distribution over the following quantum channels $\mathcal{E}_V$. 

Definition 2. Let $d$ and $e$ be positive integers such that $e \le d$. For any isometry $V : \mathbb{C}^d \to 
\mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$, let $\mathcal{E}_V : B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ be the quantum channel that consists of performing $V$, then 
tracing out (discarding) the second subsystem. 
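
A channel of the form in Definition 2 can be sketched in a few lines. As an assumption for illustration, the random isometry below is sampled by Gram-Schmidt orthonormalisation of complex Gaussian vectors, a standard way to obtain $d$ Haar-distributed orthonormal columns; the function names are hypothetical and the code is written for clarity rather than speed.

```python
import math
import random


def random_isometry(d, e, rng):
    """Columns of a random isometry V : C^d -> C^e (x) C^k, with k = ceil(d/e)."""
    k = -(-d // e)  # integer ceiling of d / e
    columns = []
    for _ in range(d):
        v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(e * k)]
        for u in columns:  # remove components along earlier columns
            overlap = sum(a.conjugate() * b for a, b in zip(u, v))
            v = [b - overlap * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(abs(b) ** 2 for b in v))
        columns.append([b / norm for b in v])
    return columns, k


def apply_channel(rho, columns, e, k):
    """E_V(rho) = tr_B[V rho V^dagger]; the k-dimensional factor is traced out."""
    d = len(rho)
    out = [[0j] * e for _ in range(e)]
    for a in range(e):
        for b in range(e):
            for m in range(k):
                row, col = a * k + m, b * k + m
                out[a][b] += sum(
                    columns[i][row] * rho[i][j] * columns[j][col].conjugate()
                    for i in range(d) for j in range(d)
                )
    return out


rng = random.Random(1)
d, e = 4, 2
columns, k = random_isometry(d, e, rng)
rho = [[1.0 if i == j == 0 else 0.0 for j in range(d)] for i in range(d)]
reduced = apply_channel(rho, columns, e, k)
print(sum(reduced[a][a] for a in range(e)))  # trace of the output state
```

The output is an $e \times e$ Hermitian matrix of unit trace, as expected of a CPTP map applied to a state.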

We now analyse the performance of the embedding obtained by picking a random V and applying 
this channel. 

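Theorem 9. Let $d$ be a positive integer, and let $\rho$ and $\sigma$ be arbitrary $d$-dimensional mixed states 
such that $\rho$ has rank $r$. Fix $\epsilon$ such that $0 < \epsilon \le 1$. For any $e$ such that $2\sqrt{rd/\epsilon} \le e \le d$, let $\mathcal{D}$ be the 
distribution on channels $\mathcal{E}_V : B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ that is uniform on isometries $V : \mathbb{C}^d \to \mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$. 
Then 

$$\Pr_{\mathcal{E}_V \sim \mathcal{D}} \left[ \|\mathcal{E}_V(\rho) - \mathcal{E}_V(\sigma)\|_1 \ge (1-\epsilon)\|\rho - \sigma\|_1 \right] \ge 1 - d \exp(-K\epsilon d),$$

for a universal constant $K$ which may be taken to be $(1 - \ln 2)/(2 \ln 2) \approx 0.22$. 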
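
To make the parameters of Theorem 9 concrete, the sketch below evaluates the constant $K$, the smallest admissible target dimension $\lceil 2\sqrt{rd/\epsilon} \rceil$, and the failure bound $d\exp(-K\epsilon d)$. The function name and the sample values of $d$, $r$ and $\epsilon$ are illustrative assumptions.

```python
import math

# Universal constant from Theorem 9.
K = (1 - math.log(2)) / (2 * math.log(2))


def theorem9_parameters(d, r, eps):
    """Smallest target dimension allowed by the theorem and its failure bound."""
    e_min = math.ceil(2 * math.sqrt(r * d / eps))
    failure_bound = d * math.exp(-K * eps * d)
    return e_min, failure_bound


print(round(K, 4))
e_min, failure_bound = theorem9_parameters(10000, 1, 1.0)
print(e_min, failure_bound)
```

For pure states ($r = 1$) in dimension $d = 10^4$ the theorem already applies at target dimension 200, a square-root reduction, with a failure bound that is negligible at this size.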

In order to prove this theorem, we will need the following technical lemma, which is proven in 
Appendix C. 

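Lemma 10. Let $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$ be a finite-dimensional Hilbert space decomposed into subsystems 
$A$ and $B$. For any projector $P$ onto a subspace of $\mathcal{H}$, let $P^\perp = I - P$ be the projector onto the 
orthogonal subspace, and let $D$ be the projector onto the support of $\mathrm{tr}_B P$. Then, for any $|\psi\rangle \in \mathcal{H}$, 

$$\mathrm{tr}\left[ (D \otimes I) P^\perp |\psi\rangle\langle\psi| P^\perp \right] \le \mathrm{tr}\left[ (D \otimes I) |\psi\rangle\langle\psi| \right]\, \mathrm{tr}\left[ P^\perp |\psi\rangle\langle\psi| \right].$$

We will also need the following useful result of Bennett et al [6] (see also [26]). 

Lemma 11. Let $|\psi\rangle$ be a $d$-dimensional pure state, let $P$ be the projector onto a $t$-dimensional 
subspace of $\mathbb{C}^d$, and let $U \in U(d)$ be picked according to Haar measure. Then, for any $\delta > 0$, 

$$\Pr_U \left[ \mathrm{tr}[U P U^\dagger |\psi\rangle\langle\psi|] \ge (1+\delta)\frac{t}{d} \right] \le \exp\left( -t(\delta - \ln(1+\delta))/(\ln 2) \right).$$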



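Proof of Theorem 9. We will upper bound the probability of the embedding failing, i.e. 

$$\Pr_V \left[ \|\mathcal{E}_V(\rho - \sigma)\|_1 < (1-\epsilon)\|\rho - \sigma\|_1 \right].$$

Let $S^+$, $S^-$ be the disjoint sets of indices of $(\rho - \sigma)$'s positive and negative eigenvalues, respectively. 
Set $s = |S^+|$, and note that $s \le \mathrm{rank}(\rho) = r$ [8, Corollary III.2.3]. For a fixed $V$, expand $V(\rho - \sigma)V^\dagger$ 
as follows: 

$$V(\rho - \sigma)V^\dagger = \sum_{i \in S^+} \lambda_i |\psi_i\rangle\langle\psi_i| - \sum_{i \in S^-} \mu_i |\psi_i\rangle\langle\psi_i|$$

for some orthonormal vectors $|\psi_i\rangle \in \mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$ and positive coefficients $\lambda_i$, $\mu_i$. Note that 

$$\sum_{i \in S^+} \lambda_i = \sum_{i \in S^-} \mu_i = \|\rho - \sigma\|_1/2.$$

For any states $\rho'$ and $\sigma'$, it holds that 

$$\|\rho' - \sigma'\|_1 = 2 \sup_{0 \le M \le I} \mathrm{tr}\, M(\rho' - \sigma');$$

in a protocol for distinguishing $\rho'$ and $\sigma'$, $M$ is a measurement operator corresponding to the 
outcome that the state was $\rho'$. Thus, in order for it to hold that $\|\mathcal{E}_V(\rho - \sigma)\|_1 \ge (1-\epsilon)\|\rho - \sigma\|_1$, 
it suffices to exhibit an operator $M$ such that $0 \le M \le I$ and 

$$\mathrm{tr}[M(\mathcal{E}_V(\rho - \sigma))] \ge (1-\epsilon)\|\rho - \sigma\|_1/2 = (1-\epsilon) \sum_{i \in S^+} \lambda_i.$$

To find such an operator, set 

$$P_V = \sum_{i \in S^+} |\psi_i\rangle\langle\psi_i|.$$

Note that $P_V$ is the projector onto a random $s$-dimensional subspace of $\mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$. Now let $D_V$ 
be the projector onto the support of $\mathrm{tr}_B P_V$. Then 

$$\mathrm{tr}[D_V \mathcal{E}_V(\rho - \sigma)] = \sum_{i \in S^+} \lambda_i\, \mathrm{tr}[D_V\, \mathrm{tr}_B |\psi_i\rangle\langle\psi_i|] - \sum_{i \in S^-} \mu_i\, \mathrm{tr}[D_V\, \mathrm{tr}_B |\psi_i\rangle\langle\psi_i|]. \qquad (1)$$

For all $i \in S^+$, $\mathrm{tr}[D_V\, \mathrm{tr}_B |\psi_i\rangle\langle\psi_i|] = 1$, and for all $i \in S^-$, it holds that $\mathrm{tr}[P_V |\psi_i\rangle\langle\psi_i|] = 0$. Aside 
from this constraint, each individual state $|\psi_i\rangle$, $i \in S^-$, is picked at random and can be expressed 
in terms of a general random state $|\eta\rangle \in \mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$ as 

$$|\psi_i\rangle = \frac{P_V^\perp |\eta\rangle}{\sqrt{\langle\eta| P_V^\perp |\eta\rangle}},$$

where $P_V^\perp = I - P_V$ and the denominator is non-zero with probability 1. Then 

$$\mathrm{tr}[D_V\, \mathrm{tr}_B |\psi_i\rangle\langle\psi_i|] = \frac{\mathrm{tr}[(D_V \otimes I) P_V^\perp |\eta\rangle\langle\eta| P_V^\perp]}{\mathrm{tr}[P_V^\perp |\eta\rangle\langle\eta|]} \le \mathrm{tr}[(D_V \otimes I) |\eta\rangle\langle\eta|],$$

where the inequality is Lemma 10. For any $e$ such that $e \ge s\lceil d/e \rceil$, $D_V$ has rank $s\lceil d/e \rceil$ with 
probability 1. So, for any such $e$, $D_V \otimes I$ has rank $s\lceil d/e \rceil^2$ with probability 1. Applying Lemma 
11, for any $\delta > 0$, 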



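$$\Pr_{|\eta\rangle} \left[ \mathrm{tr}[(D_V \otimes I) |\eta\rangle\langle\eta|] \ge (1+\delta) \frac{s\lceil d/e \rceil^2}{e\lceil d/e \rceil} \right] \le \exp\left( -s\lceil d/e \rceil^2 (\delta - \ln(1+\delta))/(\ln 2) \right)$$

and hence 

$$\Pr_V \left[ \mathrm{tr}[(D_V \otimes I) |\psi_i\rangle\langle\psi_i|] \ge (1+\delta) \frac{s\lceil d/e \rceil}{e} \right] \le \exp\left( -s\lceil d/e \rceil^2 (\delta - \ln(1+\delta))/(\ln 2) \right).$$

Using a union bound over $S^-$ in eqn. (1), for any $e$ satisfying $e \ge s\lceil d/e \rceil$ it holds that 

$$\Pr_V \left[ \mathrm{tr}[D_V \mathcal{E}_V(\rho - \sigma)] \le \sum_{i \in S^+} \lambda_i - (1+\delta) \frac{s\lceil d/e \rceil}{e} \sum_{i \in S^-} \mu_i \right] \le d \exp\left( -s\lceil d/e \rceil^2 (\delta - \ln(1+\delta))/(\ln 2) \right).$$

We now set $\delta = \frac{\epsilon e}{s\lceil d/e \rceil} - 1$. This gives the following bound, valid when $\epsilon e \ge s\lceil d/e \rceil$: 

$$\Pr_V \left[ \mathrm{tr}[D_V \mathcal{E}_V(\rho - \sigma)] \le (1-\epsilon)\|\rho - \sigma\|_1/2 \right] \le d \exp\left( -s\lceil d/e \rceil^2 \left( \frac{\epsilon e}{s\lceil d/e \rceil} - 1 - \ln \frac{\epsilon e}{s\lceil d/e \rceil} \right) / (\ln 2) \right)$$
$$\le d \exp\left( -s(d/e)\lceil d/e \rceil \left( \frac{\epsilon e}{s\lceil d/e \rceil} - 1 - \ln \frac{\epsilon e}{s\lceil d/e \rceil} \right) / (\ln 2) \right)$$
$$= d \exp\left( -\epsilon d \left( 1 - \frac{s\lceil d/e \rceil}{\epsilon e} \left( 1 + \ln \frac{\epsilon e}{s\lceil d/e \rceil} \right) \right) / (\ln 2) \right).$$

Now the function $f(x) = x(1 + \ln(1/x))$ increases with $x$ in the range $0 < x \le 1$, so for any $e$ such 
that $\frac{s\lceil d/e \rceil}{\epsilon e} \le 1/2$, we have 

$$\Pr_V \left[ \mathrm{tr}[D_V \mathcal{E}_V(\rho - \sigma)] \le (1-\epsilon)\|\rho - \sigma\|_1/2 \right] \le d \exp\left( -\epsilon d (1 - f(1/2))/(\ln 2) \right)$$
$$= d \exp\left( -\epsilon d (1 - \ln 2)/(2 \ln 2) \right).$$

Thus this inequality holds for any $e$ such that $\epsilon e \ge 2s\lceil d/e \rceil$. As $\lceil d/e \rceil < 2d/e$ for $e \le d$, this will 
be satisfied for any $e \ge 2\sqrt{sd/\epsilon}$, and in particular any $e \ge 2\sqrt{rd/\epsilon}$, implying for any such $e$ 

$$\Pr_V \left[ \|\mathcal{E}_V(\rho) - \mathcal{E}_V(\sigma)\|_1 < (1-\epsilon)\|\rho - \sigma\|_1 \right] \le d \exp\left( -\epsilon d (1 - \ln 2)/(2 \ln 2) \right)$$

as required. □ 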

Although this result is expressed in terms of the rank of the input states, a similar result would 
apply to states which are very close (in trace norm) to having low rank, but for simplicity we do 
not discuss this here. 



4.2 Lower bound 

It turns out that Lemma 6 is also strong enough to give a bound on embeddings of the trace norm, 
via a similar proof to that of Theorem 3. Charikar and Sahai [11] showed that there exists a set 
of $O(d)$ $d$-dimensional vectors whose dimension cannot be significantly reduced while preserving 
their $\ell_1$ distances. One might expect the same to be true for the trace norm, as the trace norm 
on diagonal matrices is just the $\ell_1$ norm of the diagonal entries. However, note that this does not 
follow immediately from Charikar and Sahai's work, as it is conceivable that an embedding mapping 
diagonal to non-diagonal matrices could do better. Nevertheless, we now show that dimensionality 
reduction is impossible for some sets of highly mixed states. 

Theorem 12. Let $\mathcal{D}$ be a distribution over quantum channels (CPTP maps) $\mathcal{E} : B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ 
such that, for fixed quantum states $\rho \ne \sigma$ and for all unitary $U$, 

$$\Pr_{\mathcal{E} \sim \mathcal{D}} \left[ \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_1 \ge (1-\epsilon)\|U\rho U^\dagger - U\sigma U^\dagger\|_1 \right] \ge 1 - \delta$$

for some $0 \le \epsilon, \delta < 1$. Then 

$$e \ge (1-\delta)(1-\epsilon)\sqrt{d}\, \frac{\|\rho - \sigma\|_1}{\|\rho - \sigma\|_2}.$$

In particular, if $\rho$ and $\sigma$ are orthogonal pure states, then $e \ge (1-\delta)(1-\epsilon)\sqrt{2d}$, and if $\rho$ and $\sigma$ are 
proportional to projectors onto orthogonal $d/2$-dimensional subspaces, $e \ge (1-\delta)(1-\epsilon)d$. 

So we see that achieving any significant dimensionality reduction for arbitrary highly mixed 
states is impossible, and even for pure states the dimension can only be reduced by a square root 
(which was already known [26]). 
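
The two special cases quoted after Theorem 12 follow from the ratio $\|\rho-\sigma\|_1/\|\rho-\sigma\|_2$ alone. The sketch below, with an illustrative function name, evaluates the bound for orthogonal pure states (where $\|\rho-\sigma\|_1 = 2$ and $\|\rho-\sigma\|_2 = \sqrt{2}$) and for normalised projectors onto orthogonal $d/2$-dimensional subspaces (where $\|\rho-\sigma\|_1 = 2$ and $\|\rho-\sigma\|_2 = 2/\sqrt{d}$).

```python
import math


def trace_norm_dimension_lower_bound(d, eps, delta, trace_norm, two_norm):
    """(1 - delta)(1 - eps) sqrt(d) ||rho - sigma||_1 / ||rho - sigma||_2."""
    return (1 - delta) * (1 - eps) * math.sqrt(d) * trace_norm / two_norm


# Orthogonal pure states in dimension d = 8: the bound is sqrt(2d) = 4.
pure_bound = trace_norm_dimension_lower_bound(8, 0.0, 0.0, 2.0, math.sqrt(2))

# Normalised projectors onto orthogonal (d/2)-dimensional subspaces, d = 16:
# rho - sigma has d eigenvalues of magnitude 2/d, so its 2-norm is 2/sqrt(d).
d = 16
mixed_bound = trace_norm_dimension_lower_bound(d, 0.0, 0.0, 2.0, 2.0 / math.sqrt(d))

print(pure_bound, mixed_bound)
```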

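Proof. For a randomly chosen $U$, we have 

$$\Pr_{\mathcal{E} \sim \mathcal{D},\, U \in U(d)} \left[ \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_1 \ge (1-\epsilon)\|U\rho U^\dagger - U\sigma U^\dagger\|_1 \right] \ge 1 - \delta,$$

and use Markov's inequality and the unitary invariance of the trace norm to obtain 

$$\mathbb{E}_{\mathcal{E} \sim \mathcal{D}} \int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_1\, dU \ge (1-\delta)(1-\epsilon)\|\rho - \sigma\|_1.$$

Thus there must exist some $\mathcal{E}$ such that 

$$\int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_1\, dU \ge (1-\delta)(1-\epsilon)\|\rho - \sigma\|_1.$$

Simply estimating the 1-norm by the 2-norm and using Jensen's inequality, we get the bounds 

$$(1-\delta)(1-\epsilon)\|\rho - \sigma\|_1 \le \sqrt{e} \int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2\, dU$$
$$\le \sqrt{e} \left( \int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\, dU \right)^{1/2}$$
$$\le \frac{e}{\sqrt{d}} \|\rho - \sigma\|_2,$$

where the last inequality follows from Lemma 6, assuming that $e \le d$. Rearranging gives the 
theorem. □ 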

This implies that the protocol of Theorem 9 is optimal for certain families of states, up to 
constant factors. Consider the family of pairs $U\rho U^\dagger$, $U\sigma U^\dagger$ for all $U \in U(d)$, where $\rho$ and $\sigma$ are 
proportional to projectors onto orthogonal $r$-dimensional subspaces of $\mathbb{C}^d$. Then 

$$\frac{\|\rho - \sigma\|_1}{\|\rho - \sigma\|_2} = \sqrt{\mathrm{rank}(\rho - \sigma)} = \sqrt{2r},$$

implying that embeddings of this family with constant distortion and failure probability have a 
lower bound on the target dimension of $\Omega(\sqrt{rd})$, which is achieved by the embedding of Theorem 9. 

5 Conclusions 

We have shown that in the 2-norm, any constant-distortion embedding of a unitarily invariant set of 
$d$-dimensional states must have target dimension $\Omega(d)$, in contrast to the classical situation where 
an exponential reduction can be achieved. In the trace norm, the situation is somewhat better: 
$d$-dimensional states of rank $r$ can be embedded in $O(\sqrt{rd})$ dimensions with constant distortion, 
but there is a lower bound of $\Omega\left(\sqrt{d}\,\|\rho-\sigma\|_1/\|\rho-\sigma\|_2\right)$ dimensions on any constant distortion embedding that 
succeeds for the pairs of states $U\rho U^\dagger$ and $U\sigma U^\dagger$, for all unitary $U$. 

Although the trace distance is often the most physically relevant distance measure to consider, 
we also argued that for certain tasks, the 2-norm distance is in fact the relevant distance measure 
between states. This occurs when the basis in which the states were prepared is unknown or the 
measurement apparatus does not depend on the states to be distinguished. 

The alert reader will have noticed that, in the case where one is interested in embedding a 
unitarily invariant set of states, the embedding might as well start by performing a random unitary. 
Furthermore, as any quantum channel can be represented as an isometry into a larger space followed 
by tracing out a subsystem, this makes any embedding seem somewhat similar to the embedding 
used in Theorem 9. But note that the latter embedding is subtly different, as it can be seen as 
performing a fixed isometry followed by a random unitary, rather than vice versa. Further analysis 
of this embedding might allow the gap between the upper and lower bounds in the trace norm to 
be closed. 

Another open question is whether bounds could be obtained on the possible dimensionality 
reduction when multiple copies of the input state are available. For example, if a very large 
number of copies are allowed, tomography can be performed, the input state can be approximately 
determined, and the JL Lemma applied. Presumably, even for a lower number of copies, stronger 
dimensionality reduction is possible than in the single-copy case. One could also ask whether 
stronger dimensionality reduction can be achieved by allowing some additional classical information; 
for some results in this direction, see [13]. 

A Lemmas relating to 2-norm embeddings 

We now prove the subsidiary lemmas required for the proof of Lemma 6. 
Lemma 4. Let $\mathcal{E} : B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ be a quantum channel (CPTP map). Then 

$$\operatorname{tr}[F_e\, \mathcal{E}^{\otimes 2}(F_d)] \le de.$$

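Proof. Assume that $\mathcal{E}$ has the Kraus (operator-sum) decomposition 

$$\mathcal{E}(\rho) = \sum_i A_i \rho A_i^\dagger$$

for some $e \times d$ matrices $A_i$ such that $\sum_i A_i^\dagger A_i = I_d$ and $\operatorname{tr}[A_i^\dagger A_j] = 0$ if $i \neq j$. (Note that 
such a representation does indeed exist, from the unitary freedom in the Kraus decomposition [23, 
Theorem 8.2].) Then write 

$$\begin{aligned}
\operatorname{tr}[F_e\, \mathcal{E}^{\otimes 2}(F_d)]
 &= \operatorname{tr} \sum_{i,j} F_e (A_i \otimes A_j) F_d (A_i^\dagger \otimes A_j^\dagger)
  = \sum_{i,j} \operatorname{tr}[(A_j \otimes A_i)(A_i^\dagger \otimes A_j^\dagger)] \\
 &= \sum_{i,j} \operatorname{tr}[A_j A_i^\dagger]\, \operatorname{tr}[A_i A_j^\dagger]
  = \sum_i \left(\operatorname{tr}[A_i^\dagger A_i]\right)^2 \\
 &\le \Big(\sum_i \operatorname{tr}[A_i^\dagger A_i]\Big) \max_i \operatorname{tr}[A_i^\dagger A_i] \le de.
\end{aligned}$$

The fourth equality uses the orthogonality of the $A_i$ and cyclicity of the trace, and the final 
inequality uses the facts that $\sum_i A_i^\dagger A_i = I_d$ and $\operatorname{tr}[A_i^\dagger A_i] \le \|A_i^\dagger A_i\|_\infty \operatorname{rank}(A_i^\dagger A_i) \le e$. □ 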
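
The bound of Lemma 4 can be checked numerically. The following NumPy sketch builds a random channel from a random isometry and evaluates $\operatorname{tr}[F_e\, \mathcal{E}^{\otimes 2}(F_d)]$ directly. The helper names (`swap_operator`, `random_kraus`, `swap_overlap`), the dimensions and the seed are illustrative assumptions of ours. The Kraus operators produced here are not orthogonalised, so we compare against the basis-independent expression $\sum_{i,j} |\operatorname{tr}[A_i^\dagger A_j]|^2$ given by the third equality of the proof.

```python
import numpy as np

def swap_operator(n):
    """Swap operator F_n on C^n (x) C^n, with F|a>|b> = |b>|a>."""
    F = np.zeros((n * n, n * n))
    for a in range(n):
        for b in range(n):
            F[b * n + a, a * n + b] = 1.0
    return F

def random_kraus(d, e, k, rng):
    """k Kraus operators (each e x d) cut out of a random (k*e) x d isometry."""
    G = rng.normal(size=(k * e, d)) + 1j * rng.normal(size=(k * e, d))
    V, _ = np.linalg.qr(G)  # orthonormal columns, so sum_i A_i^dagger A_i = I_d
    return [V[i * e:(i + 1) * e, :] for i in range(k)]

def swap_overlap(kraus, d, e):
    """tr[F_e E^{(x)2}(F_d)], computed directly from the Kraus operators."""
    F_d, F_e = swap_operator(d), swap_operator(e)
    out = np.zeros((e * e, e * e), dtype=complex)
    for A in kraus:
        for B in kraus:
            AB = np.kron(A, B)
            out += AB @ F_d @ AB.conj().T
    return np.trace(F_e @ out).real

rng = np.random.default_rng(0)
d, e, k = 4, 2, 3          # requires k * e >= d
kraus = random_kraus(d, e, k, rng)
value = swap_overlap(kraus, d, e)
# Basis-independent form of the same quantity: squared norm of the Gram matrix.
gram = sum(abs(np.trace(A.conj().T @ B)) ** 2 for A in kraus for B in kraus)
print(value, gram, d * e)
```

Any other way of sampling a CPTP map could be substituted for `random_kraus`, and the inequality should continue to hold.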

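Lemma 5. Let $\rho$ and $\sigma$ be $d$-dimensional quantum states. Then 

$$\int U^{\otimes 2} (\rho-\sigma)^{\otimes 2} (U^\dagger)^{\otimes 2}\, dU = \frac{\operatorname{tr}[(\rho-\sigma)^2]}{d^2-1} \left( F_d - \frac{I_{d^2}}{d} \right).$$

Proof. For brevity, set $\tau := \int U^{\otimes 2} (\rho-\sigma)^{\otimes 2} (U^\dagger)^{\otimes 2}\, dU$. Because of the averaging ("twirling") over 
the unitary group, $\tau$ must be a linear combination of the identity and swap operators on the space 
of two $d$-dimensional systems [15, Theorem 4.2.10]. To evaluate this, we write $\tau = \alpha I_{d^2} + \beta F_d$ and 
calculate 

$$\operatorname{tr}[\tau] = 0, \qquad \operatorname{tr}[F_d \tau] = \operatorname{tr}[(\rho-\sigma)^2],$$

implying that 

$$\alpha d^2 + \beta d = 0, \qquad \alpha d + \beta d^2 = \operatorname{tr}[(\rho-\sigma)^2].$$

Solving for $\alpha$ and $\beta$ gives $\beta = \operatorname{tr}[(\rho-\sigma)^2]/(d^2-1)$ and $\alpha = -\beta/d$, which is the claimed result. □ 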
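
The twirling calculation can be checked numerically. The sketch below solves the two linear constraints from the proof for the coefficients of the identity and swap operators, and compares the resulting operator with a Monte Carlo average over Haar-random unitaries. The function names, the choice $d = 2$, the sample size and the tolerance are assumptions made for illustration only.

```python
import numpy as np

def swap_operator(n):
    """Swap operator F_n on C^n (x) C^n."""
    F = np.zeros((n * n, n * n))
    for a in range(n):
        for b in range(n):
            F[b * n + a, a * n + b] = 1.0
    return F

def random_state(d, rng):
    """Random full-rank density matrix (illustrative choice of ensemble)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a Gaussian matrix."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, R = np.linalg.qr(G)
    phases = np.diag(R) / np.abs(np.diag(R))
    return Q * phases  # rescale columns so the distribution is Haar

rng = np.random.default_rng(1)
d = 2
X = random_state(d, rng) - random_state(d, rng)
t = np.trace(X @ X).real
F = swap_operator(d)

# Solve  alpha d^2 + beta d = 0,  alpha d + beta d^2 = t  for (alpha, beta).
alpha, beta = np.linalg.solve(np.array([[d**2, d], [d, d**2]], dtype=float),
                              np.array([0.0, t]))
tau = alpha * np.eye(d * d) + beta * F

# Monte Carlo estimate of the twirl of X (x) X.
n_samples = 20_000
XX = np.kron(X, X)
avg = np.zeros((d * d, d * d), dtype=complex)
for _ in range(n_samples):
    UU = np.kron(*(2 * [haar_unitary(d, rng)]))
    avg += UU @ XX @ UU.conj().T
avg /= n_samples
print(np.max(np.abs(avg - tau)), beta)
```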

B Proof of Theorem 8 



We follow the strategy of Matthews, Wehner and Winter [22] to prove Theorem 8. We will use two 
subsidiary results, which are formalised as separate lemmas. 

Lemma 13. Let $\rho$, $\sigma$ be $d$-dimensional quantum states. Then 

$$\int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^2 = \frac{\operatorname{tr}[(\rho-\sigma)^2]}{d(d+1)}.$$

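Proof. We use the tensor product trick: 

$$\int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^2
 = \int d\psi\, \operatorname{tr}\!\left[(\rho-\sigma)^{\otimes 2} (|\psi\rangle\langle\psi|)^{\otimes 2}\right]
 = \operatorname{tr}\!\left[(\rho-\sigma)^{\otimes 2}\, \frac{I_{d^2} + F_d}{d(d+1)}\right]
 = \frac{\operatorname{tr}[(\rho-\sigma)^2]}{d(d+1)},$$

noting that $\rho - \sigma$ is traceless and that $\int d\psi\, (|\psi\rangle\langle\psi|)^{\otimes 2}$ is proportional to the projector onto the 
symmetric subspace of two $d$-dimensional systems. □ 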
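
The second-moment formula of Lemma 13 is easy to test by sampling. The sketch below draws Haar-random pure states as normalised complex Gaussian vectors and compares the empirical mean of $\langle\psi|(\rho-\sigma)|\psi\rangle^2$ with $\operatorname{tr}[(\rho-\sigma)^2]/(d(d+1))$. The helper names, the dimension and the sample size are our own assumptions.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (illustrative choice of ensemble)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def haar_states(d, n, rng):
    """n Haar-random pure states in C^d, returned as the rows of an (n, d) array."""
    G = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
    return G / np.linalg.norm(G, axis=1, keepdims=True)

rng = np.random.default_rng(2)
d = 3
X = random_state(d, rng) - random_state(d, rng)   # traceless and Hermitian
psi = haar_states(d, 200_000, rng)

# <psi|X|psi> for every sample at once; real because X is Hermitian.
overlaps = np.einsum("ni,ij,nj->n", psi.conj(), X, psi).real
estimate = np.mean(overlaps ** 2)
exact = np.trace(X @ X).real / (d * (d + 1))
print(estimate, exact)
```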

Lemma 14. Let $\rho$, $\sigma$ be $d$-dimensional quantum states. Then 

$$\int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^4 \le \frac{9 \operatorname{tr}[(\rho-\sigma)^2]^2}{d(d+1)(d+2)(d+3)}.$$

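Proof. This is the same technique as the previous lemma, but is a little more involved. Writing 

$$\int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^4 = \operatorname{tr}\!\left[(\rho-\sigma)^{\otimes 4} \int d\psi\, (|\psi\rangle\langle\psi|)^{\otimes 4}\right],$$

we note that $\int d\psi\, (|\psi\rangle\langle\psi|)^{\otimes 4}$ is proportional to the projector onto the symmetric subspace of four 
$d$-dimensional systems, which we write as 

$$P_{\mathrm{sym}} = \frac{1}{24} \sum_{\pi \in S_4} P_\pi,$$

where $S_4$ is the symmetric group on 4 elements and $P_\pi$ is the operator that permutes the 4 systems 
according to the permutation $\pi$. Let $\operatorname{Cyc}(\pi)$ denote the sequence of cycle lengths in $\pi$ (e.g. 
$\operatorname{Cyc}((12)(3)) = (2, 1)$). Then, for any $d$-dimensional operator $X$, it holds that 

$$\operatorname{tr}[X^{\otimes 4} P_\pi] = \prod_{c \in \operatorname{Cyc}(\pi)} \operatorname{tr}[X^c],$$

which can be shown diagrammatically or by explicitly writing out the $P_\pi$ matrix. In particular, 
$\operatorname{tr} P_\pi = d^{|\operatorname{Cyc}(\pi)|}$. Permutations of 4 elements break down into 5 conjugacy classes, as follows: there 
is 1 of the form (1)(2)(3)(4); 6 of the form (12)(3)(4); 3 of the form (12)(34); 8 of the form (123)(4); 
and 6 of the form (1234). 

Thus 

$$\operatorname{tr} P_{\mathrm{sym}} = \frac{1}{24}\left(d^4 + 6d^3 + 11d^2 + 6d\right) = \frac{d(d+1)(d+2)(d+3)}{24},$$

implying that 

$$\int d\psi\, (|\psi\rangle\langle\psi|)^{\otimes 4} = \frac{1}{d(d+1)(d+2)(d+3)} \sum_{\pi \in S_4} P_\pi.$$

We can now calculate 

$$\operatorname{tr}\!\left[(\rho-\sigma)^{\otimes 4} \int d\psi\, (|\psi\rangle\langle\psi|)^{\otimes 4}\right] = \frac{1}{d(d+1)(d+2)(d+3)} \left( 3 \operatorname{tr}[(\rho-\sigma)^2]^2 + 6 \operatorname{tr}[(\rho-\sigma)^4] \right),$$

where we use the fact that $\rho - \sigma$ is traceless to ignore all terms corresponding to permutations with 
fixed points. The upper bound claimed in the statement of the lemma follows by simply noting 
that $\operatorname{tr}[(\rho-\sigma)^4] \le \operatorname{tr}[(\rho-\sigma)^2]^2$. □ 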
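
The combinatorics in the proof of Lemma 14 can be verified by enumerating $S_4$ explicitly. The sketch below counts the conjugacy classes, checks that the sum over permutations of $d$ to the power of the number of cycles equals $d(d+1)(d+2)(d+3)$, and evaluates the fourth moment through the cycle-trace formula. The function names and the test dimension are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations
import numpy as np

def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a tuple (perm[i] is the image of i)."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths

def fourth_moment(X):
    """Integral of <psi|X|psi>^4 over Haar-random |psi>, as a sum over S_4."""
    d = X.shape[0]
    tr_pow = {c: np.trace(np.linalg.matrix_power(X, c)).real for c in range(1, 5)}
    total = sum(np.prod([tr_pow[c] for c in cycle_lengths(p)])
                for p in permutations(range(4)))
    return total / (d * (d + 1) * (d + 2) * (d + 3))

def random_state(d, rng):
    """Random full-rank density matrix (illustrative choice of ensemble)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

# Sizes of the conjugacy classes of S_4, indexed by sorted cycle type.
classes = Counter(tuple(sorted(cycle_lengths(p))) for p in permutations(range(4)))

rng = np.random.default_rng(3)
d = 5
X = random_state(d, rng) - random_state(d, rng)   # traceless and Hermitian
norm = d * (d + 1) * (d + 2) * (d + 3)
t2 = np.trace(X @ X).real
t4 = np.trace(np.linalg.matrix_power(X, 4)).real
moment = fourth_moment(X)
closed_form = (3 * t2**2 + 6 * t4) / norm   # only fixed-point-free terms survive
bound = 9 * t2**2 / norm                    # right-hand side of Lemma 14
print(dict(classes), moment, closed_form, bound)
```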

We are finally ready to prove Theorem 8, which we restate for convenience. 
Theorem 8. Let $\rho$, $\sigma$ be $d$-dimensional quantum states. Then 

$$\frac{1}{3} \|\rho-\sigma\|_2 \le d \int d\psi\, |\langle\psi|(\rho-\sigma)|\psi\rangle| \le \|\rho-\sigma\|_2.$$

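Proof. The upper bound is straightforward: 

$$d \int d\psi\, |\langle\psi|(\rho-\sigma)|\psi\rangle| \le d \left( \int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^2 \right)^{1/2} = d \left( \frac{\operatorname{tr}[(\rho-\sigma)^2]}{d(d+1)} \right)^{1/2} \le \|\rho-\sigma\|_2,$$

where the first inequality is Jensen's inequality, and the equality is Lemma 13. For the lower bound, 
we use the fourth moment method of Berger [7] (which is just Hölder's inequality in disguise). This 
states that, for any real-valued random variable $X$, 

$$\mathbb{E}|X| \ge \frac{(\mathbb{E} X^2)^{3/2}}{(\mathbb{E} X^4)^{1/2}}.$$

Applying this inequality gives 

$$\int d\psi\, |\langle\psi|(\rho-\sigma)|\psi\rangle| \ge \frac{\left( \int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^2 \right)^{3/2}}{\left( \int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^4 \right)^{1/2}} \ge \left( \frac{\operatorname{tr}[(\rho-\sigma)^2]}{d(d+1)} \right)^{3/2} \left( \frac{d(d+1)(d+2)(d+3)}{9 \operatorname{tr}[(\rho-\sigma)^2]^2} \right)^{1/2}$$

by Lemmas 13 and 14, which simplifies to 

$$d \int d\psi\, |\langle\psi|(\rho-\sigma)|\psi\rangle| \ge \frac{(d+2)^{1/2} (d+3)^{1/2}}{3(d+1)} \|\rho-\sigma\|_2 > \frac{1}{3} \|\rho-\sigma\|_2$$

as claimed. □ 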
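
Theorem 8 can also be illustrated by sampling. The sketch below estimates $d \int d\psi\, |\langle\psi|(\rho-\sigma)|\psi\rangle|$ over Haar-random pure states and compares it with the two bounds. The state ensemble, the dimension, the sample size and all helper names are assumptions chosen for illustration.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (illustrative choice of ensemble)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def haar_states(d, n, rng):
    """n Haar-random pure states in C^d, returned as the rows of an (n, d) array."""
    G = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
    return G / np.linalg.norm(G, axis=1, keepdims=True)

rng = np.random.default_rng(4)
d = 4
X = random_state(d, rng) - random_state(d, rng)
psi = haar_states(d, 100_000, rng)

overlaps = np.einsum("ni,ij,nj->n", psi.conj(), X, psi).real
middle = d * np.mean(np.abs(overlaps))        # Monte Carlo estimate of d * E|<psi|X|psi>|
two_norm = np.sqrt(np.trace(X @ X).real)      # ||rho - sigma||_2

# Sharper constants that appear inside the proof, for comparison.
lower = np.sqrt((d + 2) * (d + 3)) / (3 * (d + 1)) * two_norm
upper = np.sqrt(d / (d + 1)) * two_norm
print(two_norm / 3, lower, middle, upper, two_norm)
```

The printed values are expected to appear in non-decreasing order, up to sampling error in the middle term.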



C Proof of Lemma 10 

We now prove Lemma 10, which we restate for convenience. 

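Lemma 10. Let $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$ be a finite-dimensional Hilbert space decomposed into subsystems 
$A$ and $B$. For any projector $P$ onto a subspace of $\mathcal{H}$, let $P^\perp = I - P$ be the projector onto the 
orthogonal subspace, and let $D$ be the projector onto the support of $\operatorname{tr}_B P$. Then, for any $|\psi\rangle \in \mathcal{H}$, 

$$\operatorname{tr}[(D \otimes I) P^\perp |\psi\rangle\langle\psi| P^\perp] \le \operatorname{tr}[(D \otimes I) |\psi\rangle\langle\psi|]\, \operatorname{tr}[P^\perp |\psi\rangle\langle\psi|].$$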
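
The statement of Lemma 10 can be sanity-checked on a random instance. In the sketch below, $P$ is deliberately supported on $S \otimes \mathcal{H}_B$ for a random 2-dimensional subspace $S$ of $\mathcal{H}_A$, so that $D$ is a nontrivial projector. This construction, the dimensions and the helper names are assumptions of ours, not part of the lemma.

```python
import numpy as np

def complex_gaussian(shape, rng):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

def projector_onto_columns(M, tol=1e-10):
    """Projector onto the column space of M, computed from the SVD."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    U = U[:, s > tol]
    return U @ U.conj().T

def partial_trace_B(M, dA, dB):
    """tr_B of an operator on C^dA (x) C^dB."""
    return np.einsum("ibjb->ij", M.reshape(dA, dB, dA, dB))

rng = np.random.default_rng(5)
dA, dB, rank = 4, 3, 2

# Isometry onto a random 2-dimensional subspace S of H_A.
S, _ = np.linalg.qr(complex_gaussian((dA, 2), rng))
# `rank` random vectors in S (x) H_B, and the projector P onto their span.
vecs = np.kron(S, np.eye(dB)) @ complex_gaussian((2 * dB, rank), rng)
P = projector_onto_columns(vecs)
P_perp = np.eye(dA * dB) - P

# D projects onto the support of tr_B P.
D = projector_onto_columns(partial_trace_B(P, dA, dB))
DI = np.kron(D, np.eye(dB))

psi = complex_gaussian(dA * dB, rng)
psi /= np.linalg.norm(psi)
proj_psi = np.outer(psi, psi.conj())

lhs = np.trace(DI @ P_perp @ proj_psi @ P_perp).real
rhs = np.trace(DI @ proj_psi).real * np.trace(P_perp @ proj_psi).real
print(lhs, rhs)
```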
The key observation which will allow us to simplify this expression is that $(D \otimes I)P = P = P(D \otimes I)$. 
To see this, note that the support of $P$ is contained within the subspace onto which $D \otimes I$ projects, 
implying that $D \otimes I$ acts as the identity with respect to $P$. The left-hand side thus simplifies to 